── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ rvest::guess_encoding() masks readr::guess_encoding()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Introduction
This exam project consists of 24 exercises, each of which must be solved and justified.
Important info
Some steps have their output hidden so that only the end results are visible.
Exercise 1
Read in the data in the file world_population.csv and select/deselect and rename columns so you end up with a tibble (tbl) named wpop_full with 266 rows and 65 columns with names as shown in the output below (the last column being 2022). Hint: Use skip in read_csv to avoid header lines not containing data or names of data.
Import data from CSV
data_dir <- here::here("world_population.csv") # path to the dataset, relative to the project root
wpop_raw_dat = suppressMessages(read_csv(data_dir, skip = 3)) # Read csv file and skip the last update info and empty rows
Add column names and renaming
colnames(wpop_raw_dat) <- as.character(unlist(wpop_raw_dat[1, ])) # Make the first row the column names
# the next two steps remove the extra row that is left over after setting the column names
wpop_raw_dat <- wpop_raw_dat[-1, ] # Remove the first row
rownames(wpop_raw_dat) <- NULL # Reset row names
head(wpop_raw_dat) # Confirm that the data set now has column names
wpop_raw_dat <- wpop_raw_dat |> # Rename the country name and country code columns
  rename(country = `Country Name`,
         code = `Country Code`)
head(wpop_raw_dat) # Confirm that the data structure is correct
Remove unwanted columns
wpop_raw_dat <- wpop_raw_dat |> # remove column "Indicator Name" and "Indicator Code".
  select(-`Indicator Name`, -`Indicator Code`)
# head(wpop_raw_dat) # confirm that columns have been removed.
wpop_full <- wpop_raw_dat |> # Remove year 2023 so that 2022 is last column
  select(-`2023`)
Use the package rvest to read in the list of country codes from the main table at ISO 3166-1 on Wikipedia and select/deselect and rename columns so you end up with a tibble (tbl) named iso_codes_all with 249 rows and 3 columns with names as shown in the output below.
Web scraping data from Wikipedia
url <- "https://en.wikipedia.org/wiki/ISO_3166-1"
webpage <- read_html(url)
tables <- webpage |>
  html_nodes("table") |> # Specify the CSS selector for the table. "html_nodes" takes all tables. "html_node" takes only the first.
  html_table() # Convert the HTML tables to data frames
head(tables) # we need table number 2
raw_dat <- tables[[2]] # table number 2, i.e. the ISO 3166-1 table
head(raw_dat) # Note: in "Afghanistan[c]" the [c] marks that the country falls under the "Naming and disputes" category, which is correct
Remove and rename columns
colnames(raw_dat)
raw_dat <- raw_dat |>
  rename(name = `English short name (using title case)`,
         iso3 = `Alpha-3 code`,
         independent = `Independent[b]`)
colnames(raw_dat) # check rename was done
iso_codes_all <- raw_dat |>
  select(name, iso3, independent)
iso_codes_all result
Note: in "Afghanistan[c]" the [c] marks that the country falls under the “Naming and disputes” category, which is correct.
iso_codes_all
# A tibble: 249 × 3
name iso3 independent
<chr> <chr> <chr>
1 Afghanistan[c] AFG Yes
2 Åland Islands ALA No
3 Albania ALB Yes
4 Algeria DZA Yes
5 American Samoa ASM No
6 Andorra AND Yes
7 Angola AGO Yes
8 Anguilla AIA No
9 Antarctica ATA No
10 Antigua and Barbuda ATG Yes
# ℹ 239 more rows
Exercise 3
Use filter() to extract the independent countries from iso_codes_all and save the result as iso_codes.
# A tibble: 194 × 3
name iso3 independent
<chr> <chr> <chr>
1 Afghanistan[c] AFG Yes
2 Albania ALB Yes
3 Algeria DZA Yes
4 Andorra AND Yes
5 Angola AGO Yes
6 Antigua and Barbuda ATG Yes
7 Argentina ARG Yes
8 Armenia ARM Yes
9 Australia AUS Yes
10 Austria AUT Yes
# ℹ 184 more rows
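The code for this step is hidden in the rendered output. A minimal sketch of the filtering, assuming the independent column holds the strings "Yes" and "No" as in the printout above. The tibble iso_demo is a made-up stand-in for iso_codes_all and is not part of the exam data:

```r
library(dplyr)

# Hypothetical stand-in for iso_codes_all (four rows in the style of the printout above)
iso_demo <- tibble(
  name = c("Afghanistan[c]", "Albania", "American Samoa", "Anguilla"),
  iso3 = c("AFG", "ALB", "ASM", "AIA"),
  independent = c("Yes", "Yes", "No", "No")
)

# Keep only the rows flagged as independent
iso_demo_indep <- iso_demo |>
  filter(independent == "Yes")

iso_demo_indep
```

Applied to the real data, the same condition on iso_codes_all would be expected to give the 194 rows shown above.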
Exercise 4
Use a suitable join (and/or filter) command to make a dataset wpop only containing those rows of wpop_full which have a matching ISO country code in iso_codes:
# returns all rows from wpop_full that have a match in iso_codes,
# based on the condition (by = c("code" = "iso3")),
# but doesn't include columns from iso_codes
wpop <- semi_join(wpop_full, iso_codes, by = c("code" = "iso3"))
wpop
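To illustrate why semi_join() fits here, the sketch below compares it with inner_join() on two made-up tables (pop_demo and iso_demo are hypothetical, not the exam data). Both keep only matching rows, but only inner_join() brings along columns from the second table:

```r
library(dplyr)

# Hypothetical toy tables, not the exam data
pop_demo <- tibble(code = c("AFG", "ALB", "WLD"),
                   country = c("Afghanistan", "Albania", "World"))
iso_demo <- tibble(iso3 = c("AFG", "ALB"),
                   name = c("Afghanistan[c]", "Albania"))

# Filtering join: rows of pop_demo with a match, columns of pop_demo only
semi_demo <- semi_join(pop_demo, iso_demo, by = c("code" = "iso3"))

# Mutating join: same rows, but the name column from iso_demo is added
inner_demo <- inner_join(pop_demo, iso_demo, by = c("code" = "iso3"))

names(semi_demo)
names(inner_demo)
```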
Show the countries/areas which have the same ISO country code in both wpop and iso_codes but different (spellings of) country names.
join_data <- inner_join(wpop, iso_codes, by = c("code" = "iso3"))
different_spelling <- filter(join_data, country != name)
result <- different_spelling |>
  select(name, code, country)
result
# A tibble: 28 × 3
name code country
<chr> <chr> <chr>
1 Afghanistan[c] AFG Afghanistan
2 Bahamas BHS Bahamas, The
3 Bolivia, Plurinational State of BOL Bolivia
4 China[c] CHN China
5 Côte d'Ivoire CIV Cote d'Ivoire
6 Congo, Democratic Republic of the COD Congo, Dem. Rep.
7 Congo COG Congo, Rep.
8 Cyprus[c] CYP Cyprus
9 Egypt EGY Egypt, Arab Rep.
10 Micronesia, Federated States of FSM Micronesia, Fed. Sts.
# ℹ 18 more rows
Exercise 6
Use the package rvest to read in the list of countries with corresponding continent codes from the main table at List of sovereign states and dependent territories by continent and select/deselect and rename columns so you end up with a tibble (tbl) named continents with 253 rows and 2 columns with names as shown in the output below.
Important hint: you need convert = FALSE in html_table() to avoid the text string "NA" (North America) to be interpreted as missing data (Not Available).
Web scraping data from Wikipedia
url <- "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_by_continent_(data_file)"
webpage <- read_html(url)
tables <- webpage |>
  html_nodes("table") |> # Specify the CSS selector for the table.
  html_table(convert = FALSE)
print(tables)
raw_continents <- tables[[3]]
raw_continents
# A tibble: 253 × 2
continent iso3
<chr> <chr>
1 AS AFG
2 EU ALB
3 AN ATA
4 AF DZA
5 OC ASM
6 EU AND
7 AF AGO
8 NA ATG
9 AS AZE
10 SA ARG
# ℹ 243 more rows
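The effect of the convert = FALSE hint can be checked offline on a small hand-written table. This is only a sketch: the HTML below is made up for illustration, and minimal_html() is the rvest helper for building a page from a string.

```r
library(rvest)

# Hypothetical two-row table; "NA" stands for North America
page <- minimal_html('
  <table>
    <tr><th>continent</th><th>iso3</th></tr>
    <tr><td>NA</td><td>ATG</td></tr>
    <tr><td>EU</td><td>ALB</td></tr>
  </table>')

# Default behaviour (convert = TRUE): cell text is type-converted
tab_converted <- page |> html_element("table") |> html_table()

# convert = FALSE: every cell is kept as the character string it was
tab_raw <- page |> html_element("table") |> html_table(convert = FALSE)

tab_converted$continent  # the "NA" text has become a missing value
tab_raw$continent        # the "NA" text is kept as a character string
```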
Exercise 7
Make a new dataset wpop2 by extending wpop with the extra column continent from the continents data (possibly using relocate() to move the continent column to the left to see it more clearly).
join_data <- inner_join(wpop, continents, by = c("code" = "iso3")) # join wpop and continent table.
wpop2 <- join_data |>
  relocate(continent, .before = country) # set continent to left of country
wpop2
# A tibble: 193 × 66
continent country code `1960` `1961` `1962` `1963` `1964` `1965` `1966`
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AS Afghanistan AFG 8.62e6 8.79e6 8.97e6 9.16e6 9.36e6 9.57e6 9.78e6
2 AF Angola AGO 5.36e6 5.44e6 5.52e6 5.60e6 5.67e6 5.74e6 5.79e6
3 EU Albania ALB 1.61e6 1.66e6 1.71e6 1.76e6 1.81e6 1.86e6 1.91e6
4 EU Andorra AND 9.44e3 1.02e4 1.10e4 1.18e4 1.27e4 1.36e4 1.45e4
5 AS United Arab… ARE 1.33e5 1.41e5 1.49e5 1.57e5 1.65e5 1.74e5 1.83e5
6 SA Argentina ARG 2.03e7 2.07e7 2.10e7 2.14e7 2.17e7 2.21e7 2.24e7
7 AS Armenia ARM 1.90e6 1.97e6 2.04e6 2.11e6 2.17e6 2.23e6 2.30e6
8 NA Antigua and… ATG 5.53e4 5.62e4 5.70e4 5.78e4 5.87e4 5.96e4 6.06e4
9 OC Australia AUS 1.03e7 1.05e7 1.07e7 1.10e7 1.12e7 1.14e7 1.17e7
10 EU Austria AUT 7.05e6 7.09e6 7.13e6 7.18e6 7.22e6 7.27e6 7.32e6
# ℹ 183 more rows
# ℹ 56 more variables: `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
# `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
# `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
# `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
# `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
# `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …
Exercise 8
Use pivot_longer() to reshape wpop2 into “long format” with columns as shown below (in particular make sure year is a numeric variable) and call the resulting tibble pop_long.
pop_long <- wpop2 |>
  pivot_longer(cols = `1960`:`2022`, # Columns to pivot (year columns)
               names_to = "year", # New column for years
               values_to = "pop") # New column for population values
pop_long <- pop_long |>
  mutate(year = as.numeric(year)) # Convert year to numeric
pop_long
# A tibble: 12,159 × 5
continent country code year pop
<chr> <chr> <chr> <dbl> <dbl>
1 AS Afghanistan AFG 1960 8622466
2 AS Afghanistan AFG 1961 8790140
3 AS Afghanistan AFG 1962 8969047
4 AS Afghanistan AFG 1963 9157465
5 AS Afghanistan AFG 1964 9355514
6 AS Afghanistan AFG 1965 9565147
7 AS Afghanistan AFG 1966 9783147
8 AS Afghanistan AFG 1967 10010030
9 AS Afghanistan AFG 1968 10247780
10 AS Afghanistan AFG 1969 10494489
# ℹ 12,149 more rows
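As an alternative to the separate mutate() step, the conversion of year can be done inside pivot_longer() with its names_transform argument. A minimal sketch on a made-up two-country table (wide_demo is hypothetical):

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table with one column per year
wide_demo <- tibble(
  country = c("A", "B"),
  `1960` = c(100, 200),
  `1961` = c(110, 190)
)

long_demo <- wide_demo |>
  pivot_longer(cols = `1960`:`1961`,
               names_to = "year",
               names_transform = list(year = as.numeric), # convert while pivoting
               values_to = "pop")

long_demo
```

Both approaches should give a numeric year column; which one to use is a matter of taste.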
Exercise 9
Make a line plot showing the population over all the years in the data with one line per country with semi-transparent lines.
ggplot(pop_long, aes(x = year, y = pop, group = country)) +
  geom_line(alpha = 0.2) +
  labs(title = "Population Growth Over Time by Country",
       x = "Year",
       y = "Population") +
  scale_x_continuous(breaks = seq(1960, 2022, by = 20))
Exercise 10
Use the code below to rescale each country’s population size to a population index which is 1 for every country in 1960. An index value of e.g. 2 would mean that the population size of that country has doubled since 1960.
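The rescaling code itself is not shown in this output. One way such an index could be computed is sketched here on made-up data; pop_demo and index_demo are hypothetical names, and this is not necessarily the code supplied with the exam:

```r
library(dplyr)

# Hypothetical long-format population data
pop_demo <- tibble(
  country = rep(c("A", "B"), each = 3),
  year = rep(c(1960, 1961, 1962), times = 2),
  pop = c(100, 150, 200, 50, 55, 60)
)

# Divide each country's population by that country's own 1960 value
index_demo <- pop_demo |>
  group_by(country) |>
  mutate(pop_index = pop / pop[year == 1960]) |>
  ungroup()

index_demo
```

In this toy data, country A reaches an index of 2 in 1962, meaning its population has doubled since 1960.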
Make a line plot showing the indexed population numbers over all the years in the data with one line per country with semi-transparent lines.
plt <- ggplot(pop_index_data, aes(x = year, y = pop_index, group = country)) +
  geom_line(alpha = 0.2) +
  labs(title = "Index 2 = double pop size since 1960\nIndex 6 = 6 times the pop size since 1960",
       x = "Year",
       y = "Population index for each country") +
  scale_x_continuous(breaks = seq(1960, 2022, by = 20))
interactive_plot <- ggplotly(plt) # Make the plot interactive
interactive_plot # Display the interactive plot
Exercise 12
Identify the two countries with extreme population indices and make a new line plot showing the indexed population numbers over all the years in the data with one line per country with semi-transparent lines without these two countries.
Identify two countries with extreme population indices
# method 1: use the interactive plot above, which showed that United Arab Emirates and Qatar have the extreme population indices.
extreme_pop_indices <- pop_index_data |>
  filter(pop_index > 60) |> # 60 because only two countries in 2020 are above it and can be considered extreme.
  select(country, continent, code) |>
  distinct()
extreme_pop_indices # United Arab Emirates and Qatar must be removed.
# A tibble: 2 × 3
# Groups: country, continent, code [2]
country continent code
<chr> <chr> <chr>
1 United Arab Emirates AS ARE
2 Qatar AS QAT
pop_index_data <- pop_index_data |>
  filter(!country %in% c("United Arab Emirates", "Qatar"))
New plot without the extreme population indices
plt <- ggplot(pop_index_data, aes(x = year, y = pop_index, group = country)) +
  geom_line(alpha = 0.2) +
  labs(title = "Without Qatar and UAE",
       x = "Year",
       y = "Population index for each country") +
  scale_x_continuous(breaks = seq(1960, 2022, by = 20))
interactive_plot <- ggplotly(plt) # Make the plot interactive
interactive_plot # Display the interactive plot
Exercise 13
Run the following command and describe in a few words what the result growth_long is.
growth_long <- pop_long |>
  group_by(country, continent, code) |> # groups country first then continent and finally code.
  reframe(pop_start = pop[1:(length(pop) - 1)], # take all elements of "pop" except last.
          pop_end = pop[2:length(pop)], # take all elements of "pop" except first.
          growth = 100 * (pop_end - pop_start) / pop_start, # Calculate the growth rate between years "in %".
          end_year = year[2:length(year)]) # End year of each interval, i.e. all years except 1960
growth_long
# A tibble: 11,966 × 7
country continent code pop_start pop_end growth end_year
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Afghanistan AS AFG 8622466 8790140 1.94 1961
2 Afghanistan AS AFG 8790140 8969047 2.04 1962
3 Afghanistan AS AFG 8969047 9157465 2.10 1963
4 Afghanistan AS AFG 9157465 9355514 2.16 1964
5 Afghanistan AS AFG 9355514 9565147 2.24 1965
6 Afghanistan AS AFG 9565147 9783147 2.28 1966
7 Afghanistan AS AFG 9783147 10010030 2.32 1967
8 Afghanistan AS AFG 10010030 10247780 2.38 1968
9 Afghanistan AS AFG 10247780 10494489 2.41 1969
10 Afghanistan AS AFG 10494489 10752971 2.46 1970
# ℹ 11,956 more rows
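# Explained in a few words:
# growth_long has one row per country and pair of consecutive years (end_year runs from 1961 to 2022).
# pop_start is the population at the start of the interval; the first pop_start of each country is the 1960 value.
# pop_end is the population at the end of the interval, e.g. 1961 for the interval 1960-1961; it then becomes the pop_start of the next interval.
# growth is the year-to-year percentage growth of the population.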
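The same quantities can be checked by hand on a tiny example. The sketch below uses dplyr's lag() instead of index slicing, which is a swapped-in technique and not the code from the exercise; pop_demo is made-up data:

```r
library(dplyr)

# Hypothetical population series growing by 10 % per year
pop_demo <- tibble(
  country = "A",
  year = c(1960, 1961, 1962),
  pop = c(100, 110, 121)
)

growth_demo <- pop_demo |>
  group_by(country) |>
  mutate(pop_start = lag(pop),                           # previous year's population
         growth = 100 * (pop - pop_start) / pop_start) |> # growth in percent
  filter(!is.na(growth)) |>                              # 1960 has no previous year
  ungroup() |>
  select(country, pop_start, pop_end = pop, growth, end_year = year)

growth_demo
```

Three years of data give two intervals, which matches the row count above: 193 countries times 62 intervals is 11,966 rows.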
Exercise 14
Make a line plot showing the population over all the years in the data with one line per country with semi-transparent lines.
ggplot(growth_long, aes(end_year, growth, group = country)) +
  geom_line(alpha = 0.2) +
  labs(title = "Population growth per year in percentage all countries",
       x = "Year",
       y = "Year-to-year population growth in pct") +
  scale_x_continuous(breaks = seq(1960, 2022, by = 20))
Exercise 15
Make a similar graphic as above but with one panel/facet per continent.
ggplot(growth_long, aes(end_year, growth, group = country)) +
  geom_line(alpha = 0.2) +
  labs(title = "Population growth per year in percentage for each country per continent",
       x = "Year",
       y = "Year-to-year population growth in pct") +
  scale_x_continuous(breaks = seq(1960, 2022, by = 20)) +
  facet_wrap(~ continent) # Create one panel per continent
Exercise 16
For each country find both the largest positive and the smallest (most negative) growth over the years in the data so you end up with a tibble (tbl) named growth_range with 193 rows and 3 columns with names as shown in the output below. hint: group_by() and summarise() are your friends.
growth_range <- growth_long |>
  group_by(country) |> # group by country.
  reframe(max_growth = signif(max(growth), 3), # largest positive growth. signif() rounds to the given number of significant digits
          min_growth = signif(min(growth), 3)) # smallest (most negative) growth.
growth_range
# A tibble: 193 × 3
country max_growth min_growth
<chr> <dbl> <dbl>
1 Afghanistan 16.1 -10.7
2 Albania 3.17 -1.21
3 Algeria 4.93 1.36
4 Andorra 8.47 -3.16
5 Angola 3.83 0.698
6 Antigua and Barbuda 2.05 -0.577
7 Argentina 1.72 0.256
8 Armenia 3.54 -3.28
9 Australia 3.44 0.141
10 Austria 1.13 -0.265
# ℹ 183 more rows
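The hint mentions summarise(), while the code above uses reframe(). Since each country yields exactly one row, the two should be interchangeable here. A minimal sketch of the summarise() variant on made-up data (growth_demo is hypothetical):

```r
library(dplyr)

# Hypothetical yearly growth rates for two countries
growth_demo <- tibble(
  country = rep(c("A", "B"), each = 3),
  growth = c(1.5, -0.5, 2.0, 0.2, 0.4, 0.3)
)

# One row per country with its largest and smallest growth
range_demo <- growth_demo |>
  group_by(country) |>
  summarise(max_growth = max(growth),
            min_growth = min(growth))

range_demo
```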
Exercise 17
Find the 10 countries which have experienced the largest growth percentage of all at some point over the years in the data.
top10 <- growth_range |>
  arrange(desc(max_growth)) |> # Sort by max growth in descending order (largest to smallest)
  head(10)
top10
# A tibble: 10 × 3
country max_growth min_growth
<chr> <dbl> <dbl>
1 Qatar 21.4 -2.61
2 Kuwait 21 -24.2
3 Seychelles 20.8 -2.59
4 United Arab Emirates 19.9 0.782
5 Rwanda 18.1 -15.5
6 Afghanistan 16.1 -10.7
7 Lebanon 14.1 -8.83
8 Somalia 13.2 -4.52
9 Jordan 12.5 1.23
10 Oman 11.3 -1.29
Exercise 18
Find the 10 countries which have experienced the most negative growth percentage of all at some point over the years in the data.
bottom10 <- growth_range |>
  arrange(min_growth) |> # Sort by min growth with arrange() (smallest to largest)
  head(10)
bottom10
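The arrange() plus head() pattern can also be written with dplyr's slice_max() and slice_min(). A short sketch on made-up values (range_demo is hypothetical); note that these functions keep ties by default, so they can return more than n rows:

```r
library(dplyr)

# Hypothetical growth ranges for four countries
range_demo <- tibble(
  country = c("A", "B", "C", "D"),
  max_growth = c(3.1, 21.4, 8.5, 1.2),
  min_growth = c(-10.7, -2.6, 0.3, -24.2)
)

top2 <- slice_max(range_demo, max_growth, n = 2)     # two largest max_growth values
bottom2 <- slice_min(range_demo, min_growth, n = 2)  # two most negative min_growth values

top2
bottom2
```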
Make a line plot showing the population over all the years in the data with different colours for each country represented in top10.
top10Years <- growth_long |>
  inner_join(top10, by = "country") |>
  select(country, growth, end_year) # select relevant columns
ggplot(top10Years, aes(end_year, growth, group = country, color = country)) +
  geom_line(alpha = 0.6) +
  labs(title = "top 10 country growth per year in percentage",
       x = "end_Year",
       y = "Year-to-year population growth in pct") +
  scale_x_continuous(breaks = seq(1960, 2022, by = 20))
Exercise 20
Use pivot_wider() to reshape growth_long to wide format with one column per year and call the result growth.
growth <- growth_long |>
  select(country, continent, code, end_year, growth) |> # select relevant columns; otherwise pop_start and pop_end act as identifiers and each year stays on its own row
  pivot_wider(names_from = end_year,
              values_from = growth) |>
  arrange(country)
growth
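Why the select() step matters can be seen on a tiny made-up example (growth_demo is hypothetical). pivot_wider() treats every column that is not named in names_from or values_from as an identifier, so a leftover pop_start column keeps the years on separate rows:

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format growth data for one country
growth_demo <- tibble(
  country = c("A", "A"),
  pop_start = c(100, 110),
  end_year = c(1961, 1962),
  growth = c(10, 10)
)

# pop_start left in: two rows, each with a missing value in one year column
wide_bad <- growth_demo |>
  pivot_wider(names_from = end_year, values_from = growth)

# pop_start dropped first: one row per country
wide_ok <- growth_demo |>
  select(country, end_year, growth) |>
  pivot_wider(names_from = end_year, values_from = growth)

nrow(wide_bad)
nrow(wide_ok)
```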
Albania, Antigua and Barbuda, Argentina, Armenia, Australia, Austria, Azerbaijan, Barbados, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Cabo Verde, Canada, Chile, China, Croatia, Cuba, Cyprus, Czechia, Denmark, Dominica, El Salvador, Estonia, Fiji, Finland, France, Georgia, Germany, Greece, Grenada, Guyana, Hungary, Iceland, Ireland, Italy, Jamaica, Japan, Kazakhstan, Korea, Dem. People’s Rep., Korea, Rep., Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Mauritius, Moldova, Monaco, Montenegro, Myanmar, Nauru, Netherlands, New Zealand, North Macedonia, Norway, Palau, Poland, Portugal, Romania, Russian Federation, Samoa, San Marino, Serbia, Slovak Republic, Slovenia, Spain, Sri Lanka, St. Kitts and Nevis, St. Lucia, St. Vincent and the Grenadines, Suriname, Sweden, Switzerland, Thailand, Tonga, Trinidad and Tobago, Ukraine, United Kingdom, United States and Uruguay
Algeria, Angola, Bahamas, The, Bahrain, Bangladesh, Belize, Benin, Bhutan, Bolivia, Botswana, Brazil, Brunei Darussalam, Burkina Faso, Burundi, Cambodia, Cameroon, Central African Republic, Chad, Colombia, Comoros, Congo, Dem. Rep., Congo, Rep., Costa Rica, Cote d’Ivoire, Dominican Republic, Ecuador, Egypt, Arab Rep., Equatorial Guinea, Eritrea, Eswatini, Ethiopia, Gabon, Gambia, The, Ghana, Guatemala, Guinea, Guinea-Bissau, Haiti, Honduras, India, Indonesia, Iran, Islamic Rep., Iraq, Israel, Kenya, Kiribati, Kyrgyz Republic, Lao PDR, Lesotho, Liberia, Libya, Madagascar, Malawi, Malaysia, Maldives, Mali, Mauritania, Mexico, Micronesia, Fed. Sts., Mongolia, Morocco, Mozambique, Namibia, Nepal, Nicaragua, Niger, Nigeria, Pakistan, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Sao Tome and Principe, Saudi Arabia, Senegal, Sierra Leone, Singapore, Solomon Islands, Somalia, South Africa, South Sudan, Sudan, Syrian Arab Republic, Tajikistan, Tanzania, Timor-Leste, Togo, Tunisia, Turkiye, Turkmenistan, Tuvalu, Uganda, Uzbekistan, Vanuatu, Venezuela, RB, Viet Nam, Yemen, Rep., Zambia and Zimbabwe
Andorra, Djibouti, Jordan, Marshall Islands and Oman
Kuwait
Lebanon
Qatar and United Arab Emirates
Rwanda
Seychelles
Exercise 23
Use pivot_longer() to convert growth_clust to long format and plot growth as a function of time with a panel/facet for each cluster.
Use pivot_longer() to convert growth_clust to long format
Plot growth as a function of time with a panel/facet for each cluster.
# remember year is <chr> so it needs to be converted to numeric
ggplot(growth_clust_long, aes(x = as.numeric(year), y = growth, group = country)) +
  geom_line(alpha = 0.6) +
  facet_wrap(~cluster) + # Create a panel for each cluster
  labs(title = "Growth over time for each Cluster",
       x = "Year",
       y = "Year-to-year population growth in pct")
Exercise 24
The code below can be used to calculate average growth rates over several years.
agg_year <- 5 # define the number of years over which to aggregate the growth rates
aggr_growth_long <- growth_long |>
  mutate(period = ((end_year - min(end_year)) %/% agg_year) * agg_year + min(end_year)) |>
  group_by(period, code, country, continent) |> # needed for the subsequent summarisation
  summarise(avg_growth = mean(growth)) # calculate avg growth for each group
`summarise()` has grouped output by 'period', 'code', 'country'. You can
override using the `.groups` argument.
aggr_growth_long
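The period formula maps each end_year to the first year of its agg_year-long block, counted from the earliest end_year. A small worked example in base R, using a made-up range of years:

```r
end_year <- 1961:1972  # hypothetical range of years
agg_year <- 5

# Integer-divide the offset from the first year by the block length,
# then scale back up and shift to get the first year of each block
period <- ((end_year - min(end_year)) %/% agg_year) * agg_year + min(end_year)

data.frame(end_year, period)
```

Here 1961 to 1965 map to period 1961, 1966 to 1970 map to 1966, and the remaining two years form an incomplete last block labelled 1971.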
Use the clustering technique of the previous exercise to divide the data into different clusters. Experiment with several period lengths (aggregation years) and number of clusters and show results for at least one combination of agg_year and number of clusters. An example is given below.
Prepare the data for clustering
# Pivot the data to wide format
aggr_growth_wide <- aggr_growth_long |>
  select(code, country, period, avg_growth) |>
  pivot_wider(names_from = period,
              values_from = avg_growth)
head(aggr_growth_wide)
# Remove non-numerical columns.
aggr_growth_Numerical <- Filter(is.numeric, aggr_growth_wide) # only return numeric columns.
head(aggr_growth_Numerical)
Calculate distance
# Compute the distance matrix using Euclidean distance
distances_aggr_growth <- dist(aggr_growth_Numerical, method = "euclidean")
Cluster algorithm
hc <- hclust(distances_aggr_growth)
hc
Label the clusters
clusters <- cutree(hc, k = 9) # 9 groups
clusters
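The three steps above (distance matrix, hierarchical clustering, cutting the tree) can be tried on a toy matrix where the expected grouping is obvious. The points are made up for illustration; hclust() is called with its default linkage, as in the code above:

```r
# Four hypothetical points: a and b lie close together, as do c and d
m <- rbind(a = c(0, 0),
           b = c(0.1, 0),
           c = c(5, 5),
           d = c(5.1, 5))

d_demo <- dist(m, method = "euclidean")  # pairwise Euclidean distances between rows
hc_demo <- hclust(d_demo)                # agglomerative clustering (default: complete linkage)
clusters_demo <- cutree(hc_demo, k = 2)  # cut the tree into 2 groups

clusters_demo
```

cutree() returns one cluster label per row, in the row order of the input, which is what makes it possible to attach the labels back to the wide table.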
Add cluster labels to aggr_growth_wide
# to add the cluster to aggr_growth_wide we need to ungroup the data frame, so that a cluster number is added for each country and not for each group.
aggr_growth_wide <- aggr_growth_wide |>
  ungroup()
aggr_growth_clust <- aggr_growth_wide |>
  mutate(cluster = clusters) |>
  relocate(cluster, .after = country)
head(aggr_growth_clust)
Prepare for plot
aggr_growth_clust
aggr_growth_clust_long <- aggr_growth_clust |>
  pivot_longer(cols = `1961`:`2021`, # Columns to pivot (year columns)
               names_to = "year", # New column for years
               values_to = "avg_growth") |> # New column for avg_growth values
  group_by(country)
head(aggr_growth_clust_long)
Plot average growth of each country over the aggregated years (5)
ggplot(aggr_growth_clust_long, aes(as.numeric(year), avg_growth, group = country)) +
  geom_line(alpha = 0.2) +
  scale_x_continuous(breaks = seq(1960, 2022, by = 20)) +
  labs(title = "Average Growth Over the aggregated years (5) by country",
       x = "Year",
       y = "Average growth over aggregated years (5) for each country")
Angola, Burundi, Benin, Burkina Faso, Bangladesh, Bahrain, Bahamas, The, Belize, Bolivia, Brazil, Brunei Darussalam, Bhutan, Botswana, Central African Republic, Cote d’Ivoire, Cameroon, Congo, Dem. Rep., Congo, Rep., Colombia, Comoros, Cabo Verde, Costa Rica, Dominican Republic, Algeria, Ecuador, Egypt, Arab Rep., Eritrea, Ethiopia, Micronesia, Fed. Sts., Gabon, Ghana, Guinea, Gambia, The, Guinea-Bissau, Guatemala, Honduras, Haiti, Indonesia, India, Iran, Islamic Rep., Iraq, Israel, Kenya, Kyrgyz Republic, Cambodia, Kiribati, Lao PDR, Lebanon, Liberia, Libya, Lesotho, Morocco, Madagascar, Maldives, Mexico, Mali, Mongolia, Mozambique, Mauritania, Malawi, Malaysia, Namibia, Niger, Nigeria, Nicaragua, Nepal, Nauru, Pakistan, Panama, Peru, Philippines, Palau, Papua New Guinea, Paraguay, Rwanda, Saudi Arabia, Sudan, Senegal, Singapore, Solomon Islands, Sierra Leone, Somalia, South Sudan, Sao Tome and Principe, Suriname, Eswatini, Syrian Arab Republic, Chad, Togo, Tajikistan, Turkmenistan, Timor-Leste, Tunisia, Turkiye, Tanzania, Uganda, Uzbekistan, Venezuela, RB, Viet Nam, Vanuatu, Yemen, Rep., South Africa, Zambia and Zimbabwe
Albania, Argentina, Armenia, Antigua and Barbuda, Australia, Austria, Azerbaijan, Belgium, Belarus, Canada, Switzerland, Chile, China, Cuba, Cyprus, Dominica, Denmark, Spain, Fiji, France, United Kingdom, Grenada, Guyana, Ireland, Iceland, Jamaica, Japan, Kazakhstan, St. Kitts and Nevis, Korea, Rep., St. Lucia, Liechtenstein, Sri Lanka, Luxembourg, Monaco, North Macedonia, Malta, Myanmar, Mauritius, Netherlands, Norway, New Zealand, Poland, Korea, Dem. People’s Rep., Russian Federation, El Salvador, San Marino, Slovak Republic, Slovenia, Sweden, Thailand, Tonga, Trinidad and Tobago, Tuvalu, Uruguay, United States, St. Vincent and the Grenadines and Samoa
Andorra and Djibouti
United Arab Emirates and Qatar
Bulgaria, Bosnia and Herzegovina, Barbados, Czechia, Germany, Estonia, Finland, Georgia, Greece, Croatia, Hungary, Italy, Lithuania, Latvia, Moldova, Marshall Islands, Montenegro, Portugal, Romania, Serbia and Ukraine
Equatorial Guinea, Jordan and Oman
Kuwait
Seychelles
Facet plot
ggplot(aggr_growth_clust_long, aes(as.numeric(year), avg_growth, group = country)) +
  geom_line(alpha = 0.2) +
  facet_wrap(~cluster) + # Create a panel for each cluster
  labs(title = "Growth over Time per 5 year by Cluster",
       x = "Year",
       y = "Average growth over aggregated years (5) for each cluster")